Searching Large Textual Dataset With Limited Computational Resources

Author

  • Anagha Kulkarni
Abstract

In this paper we propose a search approach that can process large volumes of textual data efficiently and effectively, even in environments where computational resources are limited. The traditional search solution for large collections assumes the availability of practically unlimited computational resources. For many applications and organizations this assumption is not realistic. Empirical evaluation using some of the largest available datasets demonstrates that the proposed approach is substantially more efficient than the existing approach, is on par with it, if not better, in terms of effectiveness, and can operate using very few computational resources. AUDIENCE: [Information Retrieval/Large-scale Search] [Text Processing] [Advanced technical talk]

Similar resources

The Logic and Discovery of Textual Allusion

We describe here a method for discovering imitative textual allusions in a large collection of Classical Latin poetry. In translating the logic of literary allusion into computational terms, we include not only traditional IR variables such as token similarity and n-grams but also a comparison of syntactic structure. This provides a more robust search method for Classical la...

A cognitive computational model of eye movements investigating visual strategies on textual material

This article presents a computational model of the visual strategies involved in processing textual material. An experiment is presented in which participants performed different tasks on a multi-paragraph page (searching a target word, searching the most relevant paragraph according to a goal, memorizing paragraphs). The proposed model predicts eye movements based on 5 parameters. The weightin...

Romanian Linguistic Resources On Very Large Scale

This paper suggests a methodology for building a technological environment for linguistic processing, intended to conserve, update and exploit, for research, for public and for commercial purposes, strategic linguistic resources of the Romanian language, rooted in textual data contributed daily and in the long run by important editorial houses and mass-media institutions. In essence, it describ...

Building a Large-Scale Repository of Textual Entailment Rules

Entailment rules are rules where the left hand side (LHS) specifies some knowledge which entails the knowledge expressed in the RHS of the rule, with some degree of confidence. Simple entailment rules can be combined in complex entailment chains, which in turn are at the basis of entailment-based reasoning, which has been recently proposed as a pervasive and application independent approach to ...

Integrating Textual and Model-Based Process Descriptions for Comprehensive Process Search

Documenting business processes using process models is common practice in many organizations. However, not all process information is best captured in process models. Hence, many organizations complement these models with textual descriptions that specify additional details. The problem with this supplementary use of textual descriptions is that existing techniques for automatically searching p...


Publication date: 2015